

BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

Neural Information Processing Systems

Different from the original MoCo [19] and its cross-modal versions [15, 33, 35], which apply the momentum update only to the momentum encoders in order to maintain a large consistent queue, our BMU strategy imposes the momentum update on both the momentum encoders and the (video/text) encoders.
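The update rule can be sketched as follows. This is a minimal illustration in my own notation (not the authors' code), using scalar "parameters" for clarity; the function names, momentum coefficients, and the exact ordering of the two EMA directions are assumptions for illustration.

```python
def ema(target, source, m):
    """Exponential moving average: pull target toward source with momentum m."""
    return m * target + (1.0 - m) * source

def bmu_step(enc, mom_enc, grad, lr=0.1, m_fwd=0.999, m_back=0.99):
    """One hypothetical BMU training step on scalar parameters.

    1) Gradient step on the encoder (standard training).
    2) MoCo direction: momentum encoder <- EMA toward the encoder.
    3) Extra BMU direction: encoder <- EMA toward the momentum
       encoder, letting it "review" the retained knowledge.
    """
    enc = enc - lr * grad               # gradient update on the encoder
    mom_enc = ema(mom_enc, enc, m_fwd)  # standard MoCo momentum update
    enc = ema(enc, mom_enc, m_back)     # the added, backward direction
    return enc, mom_enc

enc, mom_enc = bmu_step(1.0, 1.0, grad=1.0)
```

Note how the second EMA pulls the encoder only slightly toward the momentum encoder each step, so learning on the current task is regularized by, rather than overwritten with, the slowly evolving momentum weights.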


BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling

Neural Information Processing Systems

Video-language models suffer from forgetting old/learned knowledge when trained with streaming data. In this work, we thus propose a continual video-language modeling (CVLM) setting, where models are sequentially trained on five widely-used video-text datasets with different data distributions. Although most existing continual learning methods have achieved great success by exploiting extra information (e.g., memory data of past tasks) or dynamically extended networks, they cause enormous resource consumption when transferred to our CVLM setting. To overcome the challenges in CVLM (i.e., catastrophic forgetting and heavy resource consumption), we propose a novel cross-modal MoCo-based model with bidirectional momentum update (BMU), termed BMU-MoCo. Concretely, our BMU-MoCo has two core designs: (1) Different from the conventional MoCo, we apply the momentum update not only to the momentum encoders but also to the encoders (i.e., bidirectionally) at each training step, which enables the model to review the learned knowledge retained in the momentum encoders.


BMU-MoCo: Bidirectional Momentum Update for Continual Video-Language Modeling - Supplementary Material - Yizhao Gao

Neural Information Processing Systems

We provide the pseudocode of our BMU-MoCo in Algorithm 1. The R@5 results and their corresponding FR/HM are reported. The memory data are simply used as training samples during training. The model architecture is exactly the same as that of Base-MoCo.


